RFM Analysis

Estimated reading time: 8 minutes

RFM Analysis

This iPython notebook explains how to perform RFM analysis from customer purchase history data. The sample orders file is Sample - Superstore dataset from Tableau Software.

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

Read the sample orders file, containing all past purchases for all customers.

orders = pd.read_csv('sample-orders2.csv',sep=',')

orders.head()

	order_date	order_id	customer	grand_total
0	9/07/11	CA-2011-100006	Dennis Kane	378
1	7/08/11	CA-2011-100090	Ed Braxton	699
2	3/14/11	CA-2011-100293	Neil Französisch	91
3	1/29/11	CA-2011-100328	Jasper Cacioppo	4
4	4/08/11	CA-2011-100363	Jim Mitchum	21

Create the RFM Table

Since recency is calculated for a point in time and the Tableau Super Store dataset last order date is Dec 31 2014, that is the date we will use to calculate recency.

Set this date to the current day and extract all orders until yesterday.

import datetime as dt
NOW = dt.datetime(2014,12,31)

# Make the date_placed column datetime
orders['order_date'] = pd.to_datetime(orders['order_date'])

Create the RFM Table

rfmTable = orders.groupby('customer').agg({'order_date': lambda x: (NOW - x.max()).days, # Recency
                                        'order_id': lambda x: len(x),      # Frequency
                                        'grand_total': lambda x: x.sum()}) # Monetary Value

rfmTable['order_date'] = rfmTable['order_date'].astype(int)
rfmTable.rename(columns={'order_date': 'recency', 
                         'order_id': 'frequency', 
                         'grand_total': 'monetary_value'}, inplace=True)

Validating the RFM Table

rfmTable.head()

	recency	frequency	monetary_value
customer
Aaron Bergman	415	3	887
Aaron Hawkins	12	7	1744
Aaron Smayling	88	7	3050
Adam Bellavance	54	8	7756
Adam Hart	34	10	3249

Customer Aaron Bergman has frequency:3, monetary value:$887 and recency:415 days.

aaron = orders[orders['customer']=='Aaron Bergman']
aaron

	order_date	order_id	customer	grand_total
624	2011-02-19	CA-2011-152905	Aaron Bergman	13
665	2011-03-07	CA-2011-156587	Aaron Bergman	310
2336	2013-11-11	CA-2013-140935	Aaron Bergman	564

Inserting the date of Aaron purchase and comparing it to the recency in the rfmTable we verify our RFM table is correct.

(NOW - dt.datetime(2013,11,11)).days==415

True

Determining RFM Quartiles

quantiles = rfmTable.quantile(q=[0.25,0.5,0.75])

#25th, 50th, 75th percentile.
quantiles

	recency	frequency	monetary_value
0.25	30.0	5.0	1145.0
0.50	75.0	6.0	2257.0
0.75	183.0	8.0	3784.0

Send quantiles to a dictionary, easier to use.

quantiles = quantiles.to_dict()

quantiles

{'frequency': {0.25: 5.0, 0.5: 6.0, 0.75: 8.0},
 'monetary_value': {0.25: 1145.0, 0.5: 2257.0, 0.75: 3784.0},
 'recency': {0.25: 30.0, 0.5: 75.0, 0.75: 183.0}}

Creating the RFM segmentation table

rfmSegmentation = rfmTable

We create two classes for the RFM segmentation since, being high recency is bad, while high frequency and monetary value is good.

# Arguments (x = value, p = recency, monetary_value, frequency, k = quartiles dict)
def RClass(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
# Arguments (x = value, p = recency, monetary_value, frequency, k = quartiles dict)
def FMClass(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1

rfmSegmentation['R_Quartile'] = rfmSegmentation['recency'].apply(RClass, args=('recency',quantiles,))
rfmSegmentation['F_Quartile'] = rfmSegmentation['frequency'].apply(FMClass, args=('frequency',quantiles,))
rfmSegmentation['M_Quartile'] = rfmSegmentation['monetary_value'].apply(FMClass, args=('monetary_value',quantiles,))

rfmSegmentation['RFMClass'] = rfmSegmentation.R_Quartile.map(str) \
                            + rfmSegmentation.F_Quartile.map(str) \
                            + rfmSegmentation.M_Quartile.map(str)

rfmSegmentation.head()

	recency	frequency	monetary_value	R_Quartile	F_Quartile	M_Quartile	RFMClass
customer
Aaron Bergman	415	3	887	4	4	4	444
Aaron Hawkins	12	7	1744	1	2	3	123
Aaron Smayling	88	7	3050	3	2	2	322
Adam Bellavance	54	8	7756	2	2	1	221
Adam Hart	34	10	3249	2	1	2	212

# Uncomment any of the following lines to: copy data to clipboard or save it to a CSV file.
# rfmSegmentation.to_clipboard()
# rfmSegmentation.to_csv('rfm-table.csv', sep=',')

Who are the top 5 best customers? by RFM Class (111), high spenders who buy recently and frequently?

rfmSegmentation[rfmSegmentation['RFMClass']=='111'].sort_values('monetary_value', ascending=False).head(5)

	recency	frequency	monetary_value	R_Quartile	F_Quartile	M_Quartile	RFMClass
customer
Sanjit Engle	9	11	12210	1	1	1	111
John Lee	21	11	9801	1	1	1	111
Pete Kriz	9	12	8647	1	1	1	111
Harry Marie	2	10	8237	1	1	1	111
Lena Creighton	16	12	7661	1	1	1	111

## Question, can the RFM class be automated using KNN - What clusters does it find.